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In this paper we present a method for the regularized solution of nonlin¬ 
ear inverse problems, based on Ivanov regularization (also called method 
of quasi solutions or constrained least squares regularization). This leads 
to the minimization of a non-convex cost function under a norm con¬ 
straint, where non-convexity is caused by nonlinearity of the inverse prob¬ 
lem. Minimization is done by iterative approximation, using (non-convex) 
quadratic Taylor expansions of the cost function. This leads to repeated so¬ 
lution of quadratic trust region subproblems with possibly indefinite Hes¬ 
sian. Thus the key step of the method consists in application of an efficient 
method for solving such quadratic subproblems, developed by Rendl and 
Wolkowicz Biol. We here present a convergence analysis of the overall 
method as well as numerical experiments. 


1 Introduction 

Consider the nonlinear inverse problem of recovering x G X in 

(1) F(x)=-y 

- with a forward operator F : !D(F) C X ^ Y mapping from a Hilbert space X to a 
Banach space Y - from noisy measurements y ^ of y G Y satisfying 

(2) S(y/y^) ^ s 

where S : y^ ^ is a distance measure quantifying the data misfit, for instance 
a norm but possibly also a more involved expression such as the Kullback Leibler 
divergence arising from certain statistic measurement noise models. 
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The regularized solution of Q via the method of quasi solutions (also called Ivanov 
regularization) leads to minimization problems of the form 

( 3 ) Xp G argmin{r^(x) : x G X, ||xp ^ p}, 
where 

r^(x) = S(F(x),y^) 

with an appropriately chosen radius p that plays the role of a regularization parameter 
here, cf., li2l[4}|5}[6l[Hl|9}IIll[^- The selection of p can be done in an a priori fashion if 
the norm of some exact solution x^ to 0 is known. Namely, in that case the choice p = 
||xtp can shown to be optimal. Otherwise, also a noise level dependent a posteriori 
choice of 6 according to Morozov's discrepancy principle, i.e., p = p(6) such that 

( 4 ) 6<r^(xp)<T6 


for some r > 1 fixed independent of the noise level 6, leads to convergence, cf. [Jl. 

In this paper we wish to exploit the obvious relation to trust region subproblems to 
take advantage of an efficient algorithm proposed in [fioll for solving quadratic trust re¬ 
gion subproblems with not necessarily positive semidefinite Hessian - a situation that 
is highly relevant here in view of the fact that the cost functional in Q exhibits poten¬ 
tial nonconvexity due to nonlinearity of the forward operator F and/or the distance 
measure §. 

To this end, we first of all discretize the problem by restriction of minimization to 
finite dimensional subspaces C X and the use of computational approximations 
of the involved operators and norms, which leads to a sequence of finite dimensional 
problems 

(5) e argmin{T^(Xn) : Xn G Xn, ||Xn||n ^ p} 

where 

Tn(^Tl) = STl(FTl,(XTl),y^), 

where Sn : —> R Fn : Xn —Y are approximations, e.g., due to discratization, of S 

and F, respectively. 

We also consider the practically relevant situation that these minimization problems 
are not solved with infinite precision but in an inexact sense, i.e., we will use regular¬ 
ized approximations x^ p G Xn satisfying 


( 6 ) 


IIY^ l|2 

11 p 11 TL 




p and p) ^ min {r^(x 


Xrt CXn 


Xnlln ^ Pl + hn 


and choose p = p(6) such that 

( 7 ) < r^(x^ p) < tS-FtT^ 

with appropriately chosen tolerances y^, y^, y^. A first requirement on these toler¬ 
ances to admit solutions to 0 and 0 is 

(8) y^ ^Oandy^ < (T-1)6-Fy^ 
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We mention in passing that this nonconvex case is also of special interest for Ivanov 
regularization since this is also the situation in which it is in general not equivalent to 
Tikhonov regularization. 


2 Regularizing property of p 

To analyze convergence of p to an exact solution x^ of 0 as 6 ^ 0, we first of 
all state a straightforward monotonicity property for the exact minimizers at fixed 
discretization level n. 


Lemma 2.1. The mapping p i-)- t^(x^ p) with x^ p according to 0 is monotonically decreas¬ 
ing. 

Proof. Since the admissible set is larger for larger radius and we minimize the same 
cost function, the assertion is obvious. □ 

Moroever, we get a uniform bound on the radii chosen according to 0 with x^ p 
satisfying 0, provided the tolerances are chosen appropriately. 

Lemma 2.2. Let 0 and 

(9) nn-^-^^n(Pnxt)-r^(xt) 

hold, where is the orthogonal projection onto Xn,. 

Then p(6) according to 0 satisfies the estimate 

(10) p( 6 ) < llPnx'^ll^ =: pj^ 


Proof. By 0 and 0 , as well as 0 we have, for p = p(6), 
) > I'n(xlp)-Tln > S+Ti; -P 




Tn(Xn,p 




where in the last two inequalities we have used 0 and minimality of x^ (together 


with feasibility of PnX^ for the discretized problem). Thus the assertion follows by 
contraposition in Lemma |2.i| □ 

To prove convergence and convergence rates, we will make use of two general results 
Propositions I2.3I and 


2-4 


see also Theorems 2.4 and 2.7 in 12 for a slightly different 
setting. Here for some fixed maximal noise level 5 , (y^)6£(o,5] is a familiy of data 
satisfying 0 and a family of regularized approximations (defined by 0 

with 0 or by some other regularization method). 


Proposition 2.3. Let F be weakly sequentially closed at y in the sense that 


V('yk)ke]N/ (xk)ke]N : 

(xk ^ X and S(F(xk),'yk) —^ 0 (^nd S(y,yk) —^ 0 ^ y e D(F) and F(x) = y 


3 



and let uniform boundedness 

(12) 3 p > 0 V 6 G ( 0 , 6 ] : ||x^f < p 

as well as convergence of the operator values in the sense of the distance measure S 

(13) S(F(x^),y^) ^0fls6^0 
hold. 

Then there exists a weakly convergent subsequence of (x^)5g(o 5] and the limit x* of every 
weakly convergent subsequence o/(x^)5g(o,6] satisfies Q. 

If for such a weakly convergent subsequence (x^'^)iceN limit x*, ( |i^ can be strengthened 
to 

(14) limsup llx^'^P ^ ||x*||^, 

k—)-cx) 

then we even have strong convergence x^'^ x* as k ^ 00. 

If the solution x^ to 0 is unique then x^ ^ x^ as 6 —> 0 under conditional^ and x^ —> x^ as 
6 —)■ 0 under condition limsup^_j_Q ||x^|p ^ ||x^|p. 

To state convergence with rates to some solution x^ of Q we make use of a varia¬ 
tional source condition 

(15) 3 p G ( 0 , 1 ) Vx G :Br(x'^) 2 (x't',x'^- x) < Pllx't'-xp-F (p(S(F(x't'),F(x))) 

with some radius R > 0 and some index function cp : ]R+ —> ]R+, (i.e. cp is monotoni- 
cally increasing and limt_).o (p(t) = 0 ). Condition ( |i^ is a condition on the smoothness 
of x^ that is the stronger the faster cp decays to zero as t —> 0 . It is, e.g., satisfied with 
S(yi/y2) = illyi ~y2p and (p(t) ~ t if x^ lies in the range of the adjoint of the lin¬ 
earized forward operator (which is typically a smoothing operator) and F' is Lipschitz 
continuous 

x'^ = F'(x'^)*w and Vx G ®r(x'^) : ||F'(x) — F'(x)|| ^ L||x —x|| and L||w|| < 1 . 

By a homogeneity argument for the case of linear F it can be seen that the fastest 
possible decay of (p at zero that gives a reasonable assumption in < |i^ is (p(t) ~ t. 

Proposition 2.4. Let ( |i^ hold and x^ be contained in BR(x'*')/or all 6 G ( 0 , 6 ]. Moreover, 
assume that there exist constants Ci, C2, C3 > 0 such that 

(16) V 6 g( 0 , 6 ]: ^ ||x^P + Ci(p(C 26) andS(F(x^),y^) ^ C36. 

and that S satisfies the generalized triangle inequality 

(17) S(yi/y2) < C 4 (S(yi,y 3 ) -FS(y 2 ,y 3 )) 

for some C4 > 0 and all yi,y2 G Y. 

Then the convergence rate 

||x^-xtp = O((p(C 6 ))fls 6^0 
holds with C = max{C2, C4(C3 -I- 1 )}. 
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Proof. From ( |i^ with x = and ( |i^ , as well as Q, and monotonicity of (p we 
get 

llic^ — + 2[x\x^ — x^) 

^ P||x^ -xt ||2 + Cl (p(C 26) + (p(C4(S(F(#),y^) + 6)). 

□ 


Corollary 2.5. Let ( [ii| l hold and let ^^(5) he defined by 0 , Q, with the discretization 
level n = n(6) and the tolerances 11^(6) F^(5)iT^(6) chosen such that (|^, (|^ and 


(18) 


l|Pn(6)X'''||n(6) - ^ Cyi (piCzS) 

Nl"-|NI^( 6 )^Cy 2 <p(C 26 ) 1 . ^^.6 

T^W-T^( 6 )W < t:i 5 J ■' n( 6 ),p( 6 ) 

iTn(6) < 


2.3 


hold for fixed constants ti,T 2, Cyi, Cy2/ C2 >0 independent of 5 . 

Then ^^(5) converges to x^ subsequentially in the sense of Proposition 
If additionally a variational source condition ( |i^ and the generalized triangle inequality ^vj\ 
holds, then 

ll^n(6),p(6) = 0 ((p(C 5 )) asb^O, 

with C = max{C2, C4 (t + ti + T2 + 1 )}. 

Proof By Lemma 


2.2 


and tfiSll we have 


I|2 < 

ll^n( 6 ),p( 6 )ll ^ 


. t ||2 


+ (Ci,i + Cl 2 )(p(C 26 ). 


Moreover, by Q and <|i8ll we can estimate 


S(F(i 


06 


ti.(6),p(6) 


)ry^) = F 


,61^6 


tx(6),p(6)) ^ 1 'n(^Tx( 6 ),p( 6 ))+^l 5 ^ (^ + ^1 +^2)6- 


□ 


3 Computation of p and of p(6) 

3.1 A Newton type iteration for computing 5 c^ p 

For fixed discretization and noise level n, 6 and fixed radius p < p(6), we approximate 
the nonlinear trust region subproblem Q 

(19) min r^(xn) s.t. ||xn||n ^ P 

XrtCXn 

by a sequence of quadratic subproblems arising from second order Taylor expansion 
of the cost function 

(20) min q’^(xn) s.t. ||xn||^ ^ p 
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with 


(21) qi^(x) = +r^'(x'^)(x-x'^) + lr^"(xi^)(x-xi^)^ 

where is some current iterate. Necessary second order optimality conditions for 
these two minimization problems (in case of ( [2o| ) they will also be sufficient, cf. lIT^h 
are existence of Ap, G IR such that 

(22) T'G(x1p)+^p(^1p/-)tx = 0 

(23) Ap^O, llx^^pll^ < p, Ap(llx^^pll^-p) =0 

(24) Vw G C(x^^p) : Ty'(xlp)w^ ^ 0 

for ( [1^ , where for these simple constraints the critical cone is given by 
fXrt if ||’^rL,pllrL ^ P 

C(Xn,p) = < {x^,pk = {v G Xn : (x^^p,v)n ^ 0 } if llx^^pll^ = p and Ap = 0 
Iklp}"^ = {v e Xn : (x^ p,v)n = 0} if ||x^ pik = p and Ap > 0 


and 


(25) 

(26) 

(27) 


< K)+r: 

A'^+^ ^ 0, lix^ 


(xk(xr'-xk + A^+kxl^+V-)^ = 0 


n J y ^'"TL 

k+1 ||2 / ^ A 

n IItl ^ P' ^ 
// 


nJ 
k+1 I 


l|2 


-p) =0 


Vw G Xn : k (Xn)'^^ + A^^k^lln > 0 


for < 1 ^ with ( |^ . For the error x+^ — x^ p this implies 
(28) 

(TG'klp)+Ap+)(xI;+^ -^Ip) 

= rG'(<p)kJ^^^ -<p) + (-ApX^,p+A’^++k^ +(Ap-A’^+')xk\k 

lc-|-1 


„6'h,,6 u^k+1 

= Tn”kn,p) + 

=: tp + (Ap — A 


n 
k+1 


■Xn,p)+1'n (Vp) 

.k+1 


(xj 


'P 

,6 ' 


( 0 ( 4 +' -x+ + (Ap-A'"+M(xl^+\-) 


‘-p T l./'p /\ )(X^ /■)tI 

where In : X^ ^ R, In(xn,+) = (xn,+)n/ and 
(ry'(xk+A’^+kn)(xl^+i-x^^p) 

= T'G'(4)(4+^ -Op) + (-ApXn,p+A 


(29) 


^++1^+^ + (Ap -A'^+^)x-^ 


// , 


'P 

vk+1 


rL,p/ /ri 


= 1'n''(Xn)(4-Op)+1'n (Op)-^ (4) + (Ap “ A'"+' ) (x^^p, ^ 

=: ++0(Ap-A’"+^)(x^^p,-)n 
where under a Lipschitz condition on " 


(30) 


VXt 


gD(F, 


(x) 


(x)||n < 1-||X-X|| 
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we have 


[tp lln —I 


T- K.D + 0(xlo-<))-1'n i^n.o) dQ{x^„„-X^] 


+ 


n v^''rL,p ' '^v"''rL,p 
b ( b \ 6 ''r k 


^nJ 
.k+1 


n y'^n,pJ 
ki 


Tn K,p)-T'n K) (^1^ -^J\\n 


— llx^ —x^ll^ -l-Lllx^ —x^ 

^9ll^rL,p ^nlln ^ ^ll^rL,p ^• 


■n\\n 


_-v^ 

W^n ^■ 


n\\n 


I -i-kH- 111 _I 

I \\n —I 


0 '- 


^n,p 

^riWn 


{X^,o + 0(^lp-^n))-1'n (^Id) 


n,pJ 


d0(^lp 


Testing the sum of ( [^ and ( [^ with x 


,k+l 


'"TL/p 


and using the fact that 


k+1 w^k+l 


(Ap-A^+')(x, 


,k+1 ||2 


-L X^^^ — X^ ) 

' ^n,p^ ^n,plri 

b ||2 


'pVII-^n, lln “ P) if ll^n^plln, — P 


A^^+ll 


'"rL,p I In 


P) if ll^lp 


In ^ P 


= (Ap-A 


^0 


k+1 


l|2 


n ll^n,pllnJ 


(the latter representation is readily checked by a distinction of the cases ||x 
/ < p) we end up with 


k+1 ||2 _ 


(31) 


(T'n''(xlp)+T'n''(Xn)+A’^^^In)(xl^+^ “ + Ap ||xj^+l - 


// 

' n V '^n 
k ||2 


n,p) 


^n,p IIn 


^ l-(+n,p-Xnlln+lK,p-X 


TLlIri II^tl ^ 


nllnjll^n^^ ^n,pllrL • 


From this we wish to extract an estimate on the error norm — x^ p ||tv However, 

the operator r^^^(x^ p) +r^^^(x]^) +A^^^lrL on the left hand side in ( [^ is positive 
semidefinite only on the critical cone C(x^ p). Thus in the case Ap > 0 in which 
C(x^,p) only consists of directions orthogonal to x^ p, we have to make an additional 
assumption to avoid negative contributions on the left hand side (that generally might 
have even larger modulus than the good term Ap — x^ p ||^) 


(32) 


^n^^(^n,p) + ^pln is positive definite, where Ap = 



Here we have used the fact that the Lagrange multiplier can be explicitely represented 
due to the necessary optimality conditions pi] ) p^ . 

Assumption p^ is obviously satisifed if the cost function is convex, but it also 
admits nonconvexity possibly arising due to nonlinearity of F and/or §, as the follow¬ 
ing example shows. 


Example i. n = 1 , xf > 0 , C e 


D > 


48 Cxt ^-1 

12(1/(2x/C)-xt)2 


>0 


rtW = i(x-xf)^ 


C(x — xf + D min{ 0 , x}"^. 
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(a) (b) (c) (d) 

Figure i: From left to right: (a) function (b) detailed view of (c) Hessian , 

(d) shifted Hessian + Ap (solid) and Lagrange multiplier Ap (dashed) 


It is easy to see that x = 0 cannot be a solution to ( [i^ , thus according to ( [22I ), (|^, a 
solution X has to satisfy 


X G [-xt,xt] \{ 0 } A ^ ^ ^ > 0 

X 

x"^ < X < 0 A (x —x'^)(l — 4 C(x —x"^)^) + 4 Dx^ ^ 0 ^ 
v(x^ >x >0 A (x-x+)(l - 4 C(x-x^)^) ^0) 

^ < X < 0 A 1 — 4 C(x —x^)"^ ^ 0) V ^x^ > X > 0 A 1 — 4 C(x —x^)^ ^ o) 

—x^ < X < x^- \= < 0 Vx^>x >0 

iVc 

where we have used the fact that C < in the last equivalence. Hence we get 

_ ( -y ^ 

X e [-xt,xt] \{ 0 } A —^ 0 

X 

^ r^^^(x) H—= 1 — 12 C(x —x'*')^ + 12 D min{ 0 ,x}^ H— 

X X 

f ^ 1 -48Cxt^ + 12D(^-xt)2 >0 if -xt <x<xt-^ <0 
1 = l(xt-4C(2x3-3xV + xt^)) ^ ^(1-4Cxt^) >0 ifxt>x>0 

where we have used the fact that D > ,t'7 and 4 Cx^^ < 1 . Thus condition 

^2{^/{2VC)—xT]^ 

( [^ is satisfied although is nonconvex. An illustration of this example is provided 
in Figure 

Proposition 3.1. Let r^ be twice Lipschitz continuously differentiable ( |^ and satisfy ( |^ at 
some minimizer x^ p of ( |i^ . 

Then the iterates x^+^ defined as minimizers to ([io]) converge locally quadratically to x^ p. 
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Proof. Let x^ p G argmin{r^(xn) : x^ G Xn, ||^n||^ ^ p}- By finite dimensionality of 
the space Xn., condition ([^ implies existence of ari. > 0 such that 


(33) 


Vw G Xn 





Let the starting point x'^ G Xn be contained in some Cn-ueighborhood (wrt. || • ||n) of 
x^ p with en G ( 0 , ^). Using ( |^ with w = — x^ p in ( |3i] | (recall that r^^^(x]^) + 

In is positive semidefinite on all of Xn by ( [2^ ) for k = 0 by the triangle inequality 
yields 


(V _x^ 11^ <T(2 IIx^ _x*^ 11'^-I-llx^ —x^ll llx^^^_x^ II lllx^^^_x^ II 

'^rL,pllrL^ *-V^ll^rL,p 11 n ^ 11p 11 'tl 11p 11 'tl y 11'^rL,pll 

hence 


(34) 

(35) 

(36) 


_ -v^ II 

ll'^TL '^rL,pll'rL 


< 


2 L 


3 Le 


''rL,p 

l'^rL,p '^tlIItl 


3 L„ , 


OCx] 


1 11 

l^rL,p ^TxlIrL 


k ||2 
n lln 


^TL • 


By an inductive argument using the same estimate for general k, the iterates remain 
in the epp-neighborhood of x^ p (cf. ( [^ ) and satisfy a contraction (cf. ( [^ ) as well as 
a quadratic convergence (cf. ( [^ ) estimate. □ 


We now consider these conditions in more detail in the special case of a Hilbert 
space norm 


(37) 


HV^rV 2 ) = ^\\v^ -Vl] 


for measuring the data discrepancy. The optimality conditions ([22|)-([2^ then in terms 
of the discretized forward operator Yrt read as follows: 


(38) 


or 


Ap = 0 and 


11^ 

^rL,p lln 

.6 


ix;; nlk; ^ p and 


Vw G Xn : (bn(x®,p) -y ,bn(x?L^p)w)n = 0 and 


Vw G Xr 


|bn(xlp)w||n + (bn(x^„)-y^F"(x^„)w^)n ^ 0 


Ap = 


-(fn(x^,p) p)x^,p)n ^ Q llx^^pll^ 

(b) <1 Vw G {x^^pk : (Fn(x^^p)-y^,F(,(x^^p)w)n = 0 and 
Vw G {x^,p}^ 


,pJ 

n'.'''n,pJ 


= p and 


F(.fx^ ^)w||^ + (Fn(x^ )-y^F"(x^ )w^)n >0 
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where in case (a) we use the fact that for all w E Xn either w or —w is in the critical 
cone and in case (b) we have used the fact that for any w E Xtv 


(l'n(Xn,p) -y^,l'n(^lp)w)n + Ap(x^^p,w)n = {Fn{Xn,p ) “ y p )Proj^^6 w)^ 


As a matter of fact this shows that the case (a) of a vanishing Lagrange multiplier is 
not really relevant here, since with Y’n := 3 ?(F^(x^ p)) it implies that Projy^(FrL(x^ p) — 

= 0 so that if UneN Yn = Y we get 


Tn(^lp) = ^l|T'Tx(^lp)-y 


6 

rL,p^ 


= 2l|Projy, 


(T'n(<n)-y^)lln < -y 


iL 


where the last inequality holds for n sufficiently large, provided 6 + Hn-yn > 0, 
which is compatible with the assumptions made in Section]^ On the other hand, from 
([^ and Q it follows that 

T'n( 4 ,p( 6 )) > 5 +nn“Pn 


which by Lemma 2.1 implies p > p( 5 )- 


The positivity condition in (|^ becomes 


(39) VweXn\{0} 




rL,p^ 


)w||n + (Fn(x^„) -y^F"(x^ )■^V^)^^ + A 


w 


> 0 . 


This condition will indeed be satisfied for p = p(6) for 6 sufficiently small and n 
sufficiently large, e.g., in the sitution of an estimate 

Vw G Xn : ||F"(x^,p)w^||n ^ C||F;(x^^p)w||^ 


(which might be interpreted as a condition on the nonlinearity of the forward operator) 
holding with a constant C independent of n, since then for all w G Xn \ { 0 } 

Ill'll (xlp)w||n + (l"n(Xn,p)-y^F"(x^^p)w^)n + Ap||w|P 

> Ill'll(^lp)wlln - +yn)l|l'll(xlp)w^lln + Ap||w|p 

> (1 - (t6+tt|,)C)||F(,(x|,^p)w||^ + Ap||wP >0. 

However, Assumption ( [^ possibly remains valid also in case p is still far away 
from x^ since then the positive contribution of the (then typically larger) Lagrange 
multiplier takes effect: Note that both the residual bTT(^n,p) ~V^ ^^e norm of 

4 - 9 

the ratio get larger for smaller p < ||xT|p. 

Corollary 3.2. Let Y be a Hilbert space, let S, Sn be defined by the squared norm ( |^ and 
its finite dimensional approximation Sn(yi/y2) = ^llyi — tillin' (assume that Fn is twice 
Lipschitz continuously differentiable and that for some minimizer x^^^p of ( |i^ condition ( |^ 
holds. 

Then the iterates x])^+^ defined as minimizers to ( [^ converge locally quadratically to x^ p . 


10 





Remark i. In view of the well-known equivalence between the Levenberg Marquardt 
method and the application of a trust region method to successive quadratic approxi¬ 
mations of the nonlinear least squares cost function, there is an obvious relation to Ij^l, 
still more, since we also use the discrepancy principle for choosing the trust region 
radius (as is done for the regularization parameter cx in |j3|). The main difference to 
the method described here, besides the somewhat more general data space setting, 
lies in the fact that we start from the nonlinear trust region problem and work with 
quadratic approximations of the cost function, that are possibly nonconvex. This is 
why the algorithm from [fioll plays a key role here. 


3.2 Solving the nonconvex quadratic trust region subproblem 

Discretization Xn = with a basis {(t)\..., (j)^} of leads to an equivalent 

formulation of ( [2o| ) as a constrained quadratic minimization problem 

(40) min x^Ax — 2 a^x s.t. x^x < 


with a not necessarily positive semidefinite Hessian A. Such problems can be very 
efficiently solved by the method proposed in [(TolL which we briefly sketch in the fol¬ 
lowing. 

The key idea relies in the fact that solving ( [^ can be recast into a root finding prob¬ 
lem for a parametrized eigenvalue problem. This can be motivated by the following 
chain of inequalities, where p* is the optimal value of ( [^ (see Section 2.2 in Ifioil ). 


(41) 


p* = min x^Ax — 

ll^ll=s,yo=l 

= max min x^Ax — 2yooi^^ + tyo — t 

^ ll^ll=s,yo=i 

T T9 

> max min x Ax — 2yoa x + tyA — t 

> maxminx^Ax —2yoa^x +tyo —1 +A(||x||^ +yo ~ 

t,A x,yo 

= maxminx'^'Ax —2yoa'*'x + ryA —r +A(||x||^ — 

r,A x,'yo 

= maxfmaxminx^Ax —2yoa^x + ryA —r +A(||x||^ ) 

A V r x,yo / 

= maxmin min x^Ax —2yoa^x +A(||xp — s^) = p* 

A X y2_-I 


Indeed, with y 


Vo 

X 


^, the optimization problem on line ( [^ can be rewritten as 


(42) max min y^D(t)y — t = max(s^ + 1 )Ainin(D(t)) — t = maxk(t) 
t lly|P=s^+i t t 


where 


(43) 


D(t) = 


t —a 
-a A 
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and Ainin(M,) denotes the smallest (possibly negative) eigenvalue of some matrix M. 
More precisely, it can be shown (Theorem 14 in iToi '). that unless the so-called hard 

case occurs, for any t G E and for any ( ) being a normalized eigenvector cor¬ 


responding to Aniin(D(t)), the vector x = is well-defined and solves ( |^ with 

s = ^ po(°tp (Here the "hard case" is the pathological situation of a being orthogonal 
to the eigenspace corresponding to Amin (A).) 

Based on this observation, it remains to iteratively find a root of the function r()(t) = 
\/s^”+T — (that can be shown to be almost linear, nonincreasing and concave), 
or equivalently, to find a stationary point of the function k in which can be done 
very efficiently using inverse interpolation, cf. Ifioil . 

The main computational effort of the resulting algorithm lies in the determination 
of an eigenvector corresponding to the smallest eigenvalue of D(t) in each of these 
root finding iterations. For this purpose fast routines exist, (in our numerical tests 
we use the code from http://www.math.uwaterloo.ca/~hwolkowi//henry/software/ 
trustreg.d/ employing the Matlab routine eigs based on an Arnoldi method). Since 
this method only uses matrix vector products with A, it suffices to provide a routine for 
evaluating the linear operators F^(x), F^(x)*(FrL(x) — y^) in a given direction d, which 
particularly makes sense for Yyi being the (discretization of a) forward operator for 
some inverse problem, e.g., some parameter identification problem in a PDF. Namely, 
in that case these actions just correspond to solving the underlying linearized PDE 
model with some inhomogeneity involving d. 


3.3 Newton's method for computing p(6) 

In view of the discrepancy principle Q for choosing p = p(6), we have to approximate 
a root of the one-dimensional function cj) defined by 


(44) 


(jjlp) =Tn(Ap) 


(t+1)6+tT^+ti^ 
where = --- 


which by Lemma [2^ is monotonically decreasing. For this purpose, Newton's method 
is known to converge globally and quadratically, provided (j) is twice continuously 
differentiable. As a matter of fact, in the generic case of x^ p lying on the boundary of 
the feasible set, the derivative of (j) is just the Lagrange multiplier for ( [1^ , since by the 
complementarity condition in 




dp 

= -Ao 


dp 

^ dAp 


m,p I 


-p 


dp dp 


"Tl.,pl 


dx^ 

-P)+Ap«p,^).-Ap 
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by ( [22I ) and ||x^ pp = p. A similar observation has already been made for the deriva¬ 
tive of the cost functional with respect to the regularization parameter in Tikhonov 
regularization cf. iTj. 


3.4 Algorithm 

Altogether we arrive at the following nested iteration. 

Algorithm 1. . 

1: Given 6 > 0 , choose t > 1 , n, ffn/ hn that 0 , 0 , ( [i8| ) are satisfied. 


2 

3 

4 

5 

6 

7 

8 

9 

10 


Po = 0 , Pi > 0 ; 

Setl = l,5c^,p„=0,5^^,„, =0; 


while r^(x^,p,) > t 6 +r|^ 


05 

TL,pl 

do 


6 

TL,pv^ 

V — 0 — Y^ 

K U, '^rL,pi^_i 

while > minx^exjr^(xn) : ||xn||^ ^ pJ + ti^ do 

Compute solution x]^+^ and Lagrange multiplier to ( [^ , < |^ , P = Pi 
Set k = k + 1 
Set 5 crL,pi - 


Xn/ ^Pi = 


Set pi+1 = Pi + 
Set I = 1+1 


Api 


with Td as in ( [^ 


4 Numerical tests 

To illustrate performance of Algorithm we make use of an implementation of the 
method described in [fioil available on the web page of one of the authors http:// 
WWW.math.uwaterloo.ca/~hwolkowi//henry/software/trustreg.d/ and consider the 
nonlinear integral equation 


y(t) = 


(x(s) + lOx(s)^) ds t G ( 0 , 1 ) 


0 


with X = Y = L^( 0 , 1 ) and S(yi,y2) = J\\v^ The integral equation is dis¬ 

cretized with a composite trapeziodal rule on an equidistant grid with 100 breakpoints. 

Figure shows the noisy data as well as the exact and reconstructed solutions with 
different noise levels. In figure]^ we plot error and residual, as well as the number of 
solved nonconvex quadratic subproblems over the radius, for noise levels 1 and 3 per 
cent, during the iteration over p. Figure illustrates that condition ( [^ remains valid 
throughout the iteration over p, while several nonconvex trust region subproblems 
have to be solved (the number being smaller for large 6 since less iterations are carried 
out in that case). 
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6 = 3 % 


6 = 1 % 


6 = 0 . 1 % 


Figure 2: Noisy data (top row) and reconstructions (bottom row) for 6 = 3 % (left col¬ 
umn), 6 = 1% (middle column), 6 = 0.1% (right column) 
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Figure 3: Residuals (top row) and errors (bottom row) for 6 = 3 % (left column) and 
6 = 1% (right column). 
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p p 


Figure 4: Smallest eigenvalue of Hessian of Lagrangian (top row) and number of non- 
convex quadratic subproblems (bottom row) for 6 = 3 % (left column) and 
6 = 1% (right column). 
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